
feat(context): Add Prompt Caching to Code Context (CODY-4807) #6878

Merged — 15 commits merged into main from jlxu/cache_control on Feb 1, 2025

Conversation

@julialeex (Contributor) commented Jan 30, 2025

Problem (why)
Currently, the daily cost and token usage of our models are very high. We want to find ways to reduce them.

Solution (context)
Prompt caching can significantly reduce token costs: each cache hit cuts the cost of the cached tokens by 90%, while each cache miss (which writes the cache) adds a 25% surcharge.

After some initial analysis, we decided to start by implementing prompt caching for Claude models.
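
For intuition, here is a back-of-the-envelope sketch (not part of the PR) of how the cache-hit rate determines whether caching pays off, using the 90%/25% figures above:

```ts
// Rough estimate, assuming Anthropic-style pricing where a cache read costs 10% of the
// base input-token price and a cache write costs 125% of it (the 90% / 25% figures above).
const CACHE_READ_MULTIPLIER = 0.1
const CACHE_WRITE_MULTIPLIER = 1.25

/** Expected cost of cacheable tokens relative to not caching, for a hit rate in [0, 1]. */
function relativeCost(hitRate: number): number {
    return hitRate * CACHE_READ_MULTIPLIER + (1 - hitRate) * CACHE_WRITE_MULTIPLIER
}

// Break-even: 0.1·h + 1.25·(1 − h) = 1  ⇒  h ≈ 0.217, i.e. caching pays off once
// roughly 22% of requests hit the cache.
console.log(relativeCost(0.22)) // ≈ 1.0 (break-even)
console.log(relativeCost(0.8))  // ≈ 0.33, i.e. ~67% cheaper than no caching
```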

Implementation (what)

  • Adding cache_control: ephemeral to the prompt, which creates a cache entry with a 5-minute TTL (see the sketch after this list).
  • Server-side implementation in this PR
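
For context, a minimal illustration of what the ephemeral cache marker looks like in an Anthropic Messages API request body (this sketches the upstream API shape, not the exact code added in this PR; the model name and context variable are placeholders):

```ts
const retrievedCodeContext = '/* ...large block of retrieved code context... */'

// Illustrative Anthropic Messages API request body. The cache_control marker asks the
// API to cache the prompt prefix up to and including this block, with a ~5-minute TTL.
const requestBody = {
    model: 'claude-3-5-sonnet-latest', // placeholder model name
    max_tokens: 1024,
    system: [
        {
            type: 'text',
            text: retrievedCodeContext, // stable context that is reused across requests
            cache_control: { type: 'ephemeral' },
        },
    ],
    messages: [{ role: 'user', content: 'Explain what this function does.' }],
}
```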

Anthropic Docs (context)

Test plan

  • Tested locally and verified that the cache is being added

@julialeex requested a review from olafurpg on January 30, 2025 00:18
Review thread on lib/shared/src/chat/transcript/messages.ts (outdated, resolved)
Review thread on lib/shared/src/sourcegraph-api/completions/types.ts (outdated, resolved)
Review thread on vscode/src/edit/prompt/context.ts (outdated, resolved)
@julialeex requested a review from olafurpg on January 31, 2025 02:29
@olafurpg (Member) left a comment:

Blocking comments: 1) undo the snapshot/recording changes by not enabling prompt caching in agent integration tests, and 2) come up with a strategy to measure the effectiveness of prompt caching.

Currently, in the diff, prompt caching is unconditionally enabled everywhere and we don't track the impact with telemetry events. My expectation is that we should be able to enable/disable this feature dynamically with feature flags and we should be able to answer questions like how much faster responses are with prompt caching (or what % of tokens are cached).
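
A minimal sketch of that kind of gating, with assumed (not actual Cody) feature-flag and telemetry interfaces:

```ts
interface Message {
    speaker: 'human' | 'assistant' | 'system'
    text: string
    cacheEnabled?: boolean
}

// Assumed interfaces for illustration; the real Cody feature-flag and telemetry APIs may differ.
declare const featureFlagProvider: { evaluateFeatureFlag(flag: string): Promise<boolean> }
declare const telemetryRecorder: { recordEvent(feature: string, action: string): void }

async function applyPromptCaching(messages: Message[]): Promise<Message[]> {
    const enabled = await featureFlagProvider.evaluateFeatureFlag('cody-prompt-caching')
    if (!enabled) {
        return messages // flag off (the default in agent integration tests): no snapshot changes
    }
    telemetryRecorder.recordEvent('cody.promptCaching', 'enabled')
    // Mark the large, stable context messages as cacheable; a provider-specific
    // cache_control field would be attached when the request is serialized.
    return messages.map(m => (m.speaker === 'system' ? { ...m, cacheEnabled: true } : m))
}
```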

Review thread on vscode/src/prompt-builder/utils.ts (outdated, resolved)
Review thread on vscode/src/edit/prompt/context.ts (outdated, resolved)
Diff context for the comment below:

```
@@ -327,3 +332,12 @@ export class ClientConfigSingleton {
        return this.fetchConfigEndpoint(signal, config)
    }
}
// It's really complicated to access CodyClientConfig from functions like utils.ts
```
@olafurpg (Member) commented:

I think it's worth getting team alignment on what the singleton story should look like for CodyClientConfig. Currently, the interface is poorly designed: it mixes information from the server with custom local settings. I think a cleaner solution is to separate the /.api/client-config settings from the locally inferred settings. Going forward, we should stop using the site version as a feature-gating signal and exclusively use /.api/client-config, as you have done with api-version=7 in this PR.
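
A sketch of the kind of separation being suggested; the type and field names here are hypothetical:

```ts
// Hypothetical split between server-provided and locally inferred configuration.
interface ServerClientConfig {
    // Fetched from the Sourcegraph instance's /.api/client-config endpoint.
    apiVersion: number // e.g. 7, used for feature gating instead of the site version
    promptCachingEnabled: boolean
}

interface LocalClientConfig {
    // Derived purely from local editor settings and environment, never from the server.
    isDotCom: boolean
    experimentalOverrides: Record<string, boolean>
}

interface ResolvedClientConfig {
    server: ServerClientConfig
    local: LocalClientConfig
}
```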

@olafurpg (Member) commented:

Not a blocking comment; I just want to call out that I'm not 100% happy with how the current code is organized.

@julialeex (Contributor, author) replied Jan 31, 2025:

Ack. Can revisit this problem later.

Review thread on agent/src/custom-commands.test.ts (outdated, resolved)
Review thread on agent/src/__snapshots__/custom-commands.test.ts.snap (outdated, resolved)
@olafurpg (Member) commented:
I think we disable feature flags by default in agent integration tests, so if you gate caching with a feature flag, then the existing snapshot tests should pass without updating the recordings.

@olafurpg (Member) left a comment:
Stamping since I will be afk until Tuesday and we are targeting this for Wednesday. I recommend getting another approval on the PR once you have addressed the comments above.

@julialeex changed the title from "feat/cody: Add Prompt Caching to Code Context (CODY-4807)" to "feat(context): Add Prompt Caching to Code Context (CODY-4807)" on Jan 31, 2025
@julialeex merged commit f32989a into main on Feb 1, 2025
21 checks passed
@julialeex deleted the jlxu/cache_control branch on February 1, 2025